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SOME RESULTS ON THE REGULARIZATION OF LSQR FOR 
LARGE-SCALE DISCRETE ILL-POSED PROBLEMS* 

YI HUANGt AND ZHONGXIAO JIA* 

Abstract. LSQR, a Lanczos bidiagonalization based Krylov subspace iterative method, and its 
mathematically equivalent CGLS applied to normal equations system, are commonly used for large- 
scale discrete ill-posed problems. It is well known that LSQR and CGLS have regularizing effects, 
where the number of iterations plays the role of the regularization parameter. However, it has long 
been unknown whether the regularizing effects are good enough to find best possible regularized 
solutions. Here a best possible regularized solution means that it is at least as accurate as the best 
regularized solution obtained by the truncated singular value decomposition (TSVD) method. In 
this paper, we establish bounds for the distance between the /c-dimensional Krylov subspace and 
the /c-dimensional dominant right singular space. They show that the Krylov subspace captures the 
dominant right singular space better for severely and moderately ill-posed problems than for mildly 
ill-posed problems. Our general conclusions are that LSQR has better regularizing effects for the first 
two kinds of problems than for the third kind, and a hybrid LSQR with additional regularization 
is generally needed for mildly ill-posed problems. Exploiting the established bounds, we derive 
an estimate for the accuracy of the rank k approximation generated by Lanczos bidiagonalization. 
Numerical experiments illustrate that the regularizing effects of LSQR are good enough to compute 
best possible regularized solutions for severely and moderately ill-posed problems, stronger than 
our theory predicts, but they are not for mildly ill-posed problems and additional regularization is 
needed. 
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1. Introduction. We consider the iterative solution of large-scale discrete ill- 
posed problems 

(1.1) min IIAcc-&II, A G 6 € R™, m>n, 

where the norm || • || is the 2-norm of a vector or matrix, and the matrix A is extremely 
ill conditioned with its singular values decaying gradually to zero without a noticeable 
gap. This kind of problem arises in many science and engineering areas, such as 
signal processing and image restoration, typically when discretizing Fredholm integral 
equations of the first-kind [20, 22]. In particular, the right-hand side b is affected by 
noise, caused by measurement or discretization errors, i.e., 

b = b + e, 

where e G R™ represents the Gaussian white noise vector and b G R™ denotes the 
noise-free right-hand side, and it is supposed that ||e|| < ||6||. Because of the presence 
of noise e in 6 and the ill-conditioning of A, the naive solution Xnaive = A^b of (1.1) 
is meaningless and far from the true solution Xtrue = Am, where the superscript f 
denotes the Moore-Penrose generalized inverse of a matrix. Therefore, it is necessary 
to use regularization to determine a best possible approximation to Xtme = A'^b 
[14, 18, 20, 22]. 
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The solution of (1.1) can be analyzed by the SVD of A: 


( 1 . 2 ) A = 

where U = (ui,U 2 ,..., Um) & and V = {vi,V 2 , ■ ■ ■, Vn) G are orthogonal 

matrices, and the entries of the diagonal matrix E = diag((7i, a 2 , ■ ■ ■, (Jn) G R”^" are 
the singular values of A, which are assumed to be simple throughout the paper and 
labelled in decreasing order ai > a 2 > ■ ■ ■ > crn > 0. With (1.2), we obtain 
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Throughout the paper, we assume that b satisfies the discrete Picard condition: On 
average, the coefficients \ ufb\ decay faster than the singular values. To be definitive, 
for simplicity we assume that these coefficients satisfy a widely used model in the 
literature, e.g., [20, p. 81, 111 and 153] and [22, p. 68]: 

(1.4) j ujb j= (jI~^^, /3 > 0, i = 1,2,..., n. 

Let fco be the transition point such that \u1^b\ > and \u1^_^_ib\ < [u^g+iej 

[22, p. 98]. Then the TSVD method computes 


TSVD 


= 


E uj b 
2=1 

2 = 1 


E uj b 
2=1 

2—1 2 —fco + l 


k < kg] 


k > ko, 


which can be written as = ^4^6, the solution of the modihed problem that 

replaces A by its best rank k approximation Ak = Uk^kV^ in (1.1), where Uk = 
(ui,..., Mfc), Vfc = (ill,..., life) and Efc = diag(cri,..., ak). Remarkably, is 

the minimum-norm least squares solution of the perturbed problem that replaces A 
in (1.1) by its best rank k approximation Ak, and the best possible TSVD solution 
of (1.1) by the TSVD method is x^^^^ [22, p. 98]. A number of approaches have 
been proposed for determining fcg, such as discrepancy principle, discrete L-curve and 
generalized cross validation; see, e.g., [1, 2, 20, 27, 35] for comparisons of the classical 
and new ones. In our numerical experiments, we use the L-curve criterion in the 
TSVD method and hybrid LSQR. The TSVD method has been widely studied; see, 
e.g., [4, 20, 22, 29]. 

For a small and moderate (1.1), the TSVD method has been used as a general- 
purpose reliable and efficient numerical method for solving (1.1). As a result, we will 
take the TSVD solution x^^^^ as a standard reference when assessing the regularizing 
effects of iterative solvers and accuracy of iterates under consideration in this paper. 

As well known, it is generally not feasible to compute SVD when (1.1) is large. 
In this case, one typically projects (I.l) onto a sequence of low dimensional Krylov 
subspaces and gets a sequence of iterative solutions [18, 20, 22, 37]. The Conjugate 
Gradient (CG) method has been used when A is symmetric definite [18]. As a CG- 
type method applied to the semidefinite linear system Ax = 6 or the normal equations 
system Ax = A^b, the CGLS algorithm has been studied; see [6, 20, 22] and the 
references therein. The LSQR algorithm [33], which is mathematically equivalent to 
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CGLS, has attracted great attention, and is known to have regularizing effects and 
exhibits semi-convergence (see [20, p. 135], [22, p. 110], and also [5, 19, 23, 32]): The 
iterates tend to be better and better approximations to the exact solution Xtrue and 
their norms increase slowly and the residual norms decrease. In later stages, however, 
the noise e starts to deteriorate the iterates, so that they will start to diverge from 
Xtrue and instead converge to the naive solution Xnaive, while their norms increase 
considerably and the residual norms stabilize. Such phenomenon is due to the fact that 
a projected problem inherits the ill-conditioning of (1.1). That is, as the iterations 
proceed, the noise progressively enters the solution subspace, so that a small singular 
value of the projected problem appears and the regularized solution is deteriorated. 

As far as an iterative solver for solving (1.1) is concerned, a central problem is 
whether or not a pure iterative solver has already obtained a best possible regular¬ 
ized solution at semi-convergence, namely whether or not the regularized solution at 
semi-convergence is at least as accurate as x^^^^. As it appears, for Krylov sub¬ 
space based iterative solvers, their regularizing effects critically rely on how well the 
underlying fc-dimensional Krylov subspace captures the /c-dimensional dominant right 
singular subspace of A. The richer information the Krylov subspace contains on the 
A:-dimensional dominant right singular subspace, the less possible a small Ritz value 
of the resulting projected problem appears and thus the better regularizing effects 
the solver has. To precisely describe the regularizing effects of an iterative solver, 
we introduce the term of full or partial regularization: If the iterative solver itself 
computes a best possible regularized solution at semi-convergence, it is said to have 
the full regularization; in this case, no additional regularization is needed. Here, as 
defined in the abstract, a best possible regularized solution means that it is at least 
as accurate as the best regularized solution obtained by the truncated singular value 
decomposition (TSVD) method. Otherwise, it is said to have the partial regulariza¬ 
tion; in this case, in order to compute a best possible regularized solution, its hybrid 
variant, e.g., a hybrid LSQR, is needed that combines the solver with additional regu¬ 
larization [5, 13, 28, 30, 31, 32], which aims to remove the effects of small Ritz values, 
and expand the Krylov subspace until it captures all the dominant SVD components 
needed and the method obtains a best possible regularized solution. The study of 
the regularizing effects of LSQR and CGLS has been receiving intensive attention for 
years; see [20, 22] and the references therein. However, there has yet been no definitive 
result or assertion on their full or partial regularization. 

To proceed, we need the following definition of the degree of ill-posedness, which 
follows Hofmann’s book [24] and has been commonly used in the literature, e.g., 
[20, 22]: If there exists a positive real number a such that the singular values satisfy 
(Tj = then the problem is termed as mildly or moderately ill-posed if a < 1 or 

a > 1; if CTj = 0{e~°‘^) with a > 0 considerably, j = 1,2,..., n, then the problem is 
termed severely ill-posed. It is clear that the singular values aj of a severely ill-posed 
problem decay exponentially at the same rate p~^, while those of a moderately or 
mildly ill-posed problem decay more and more slowly at the decreasing rate (j+t) 
approaching one with increasing j, which, for the same j, is smaller for the moderately 
ill-posed problem than it for the mildly ill-posed problem. 

Other minimum-residual methods have also gained attention for solving (1.1). For 
problems with A symmetric, MINRES and its preferred variant MR-H are alternatives 
and have been shown to have regularizing effects [18]. When A is nonsymmetric 
and multiplication with A^ is difficult or impractical to compute, GMRES and its 
preferred variant RRGMRES are candidates [10, 30]. The hybrid approach based on 
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the Arnoldi process was first introduced in [ 11 ], and has been studied in [9, 11, 12 , 28]. 
Recently, Gazzola et al. [15, 16, 17, 31] have studied more methods based on the 
Lanczos bidiagonalization, the Arnoldi process and the nonsymmetric Lanczos process 
for the severely ill-posed problem (1.1). They have described a general framework of 
the hybrid methods and present Krylov-Tikhonov methods with different parameter 
choice strategies employed. 

In this paper, we focus on LSQR. We derive bounds for the 2-norm distance 
between the underlying fc-dimensional Krylov subspace and the fc-dimensional right 
singular space. There has been no rigorous and quantitative result on the distance 
before. The results indicate that the /c-dimensional Krylov subspace captures the 
A:-dimensional dominant right singular space better for severely and moderately ill- 
posed problems than for mildly ill-posed problems. As a result, LSQR has better 
regularizing effects for the first two kinds of problems than for the third kind. By the 
bounds and the analysis on them, we draw a definitive conclusion that LSQR gener¬ 
ally has only the partial regularization for mildly ill-posed problems, so that a hybrid 
LSQR with additional explicit regularization is needed to compute a best possible 
regularized solution. We also use the bounds to derive an estimate for the accuracy 
of the rank k approximation, generated by Lanczos bidiagonalization, to A, which is 
closely related to the regularization of LSQR. Our results help to further understand 
the regularization of LSQR, though they appear less sharp. In addition, we derive 
a bound on the diagonal entries of the bidiagonal matrices generated by the Lanc¬ 
zos bidigonalization process, showing how fast they decay. Numerical experiments 
confirm our theory that LSQR has only the partial regularization for mildly ill-posed 
problems and a hybrid LSQR is needed to compute best possible regularized solu¬ 
tions. Strikingly, the experiments demonstrate that LSQR has the full regularization 
for severely and moderately ill-posed problems. Our theory gives a partial support 
for the observed general phenomena. Throughout the paper, all the computation is 
assumed in exact arithmetic. Since CGLS is mathematically equivalent to LSQR, all 
the assertions on LSQR apply to CGLS. 

This paper is organized as follows. In Section 2, we describe the LSQR algorithm, 
and then present our theoretical results on LSQR with a detailed analysis. In Section 
3, we report numerical experiments to justify the partial regularization of LSQR for 
mildly ill-posed problems. We also report some definitive and general phenomena 
observed. Finally, we conclude the paper in Section 4. 

Throughout the paper, we denote by ICk{C,w) = span{w,Cw,... the 

A:-dimensional Krylov subspace generated by the matrix C and the vector w, by || • ||i? 
the Frobenius norm of a matrix, and by I the identity matrix with order clear from 
the context. 

2. The regularization of LSQR. LSQR for solving (1.1) is based on the Lanc¬ 
zos bidiagonalization process, which starts with pi = 6 /|| 6 || and, at step (iteration) 
A:, computes two orthonormal bases { 91 , < 72 , ■ ■ ■, 9 fe} and {pi,P 2 , ■ ■ ■ ,Pk} of the Krylov 
subspaces ICk{A'^A,A'^b) and ICk{AA^,b), respectively. 

Define the matrices Qk = (gi, 92 , • ■ ■, Qk) and Pk+i = (pi,P 2 , • ■ • ,Pk+i)- Then the 
A:-step Lanczos bidiagonalization can be written in the matrix form 

( 2 . 1 ) AQk = Pk+iBk, 

(2.2) = QkBk + Ofc+igfc+ie^_|_]^, 

where Ck+i denotes the (fc + l)-th canonical basis vector of and the quantities 
ai,i = 1, 2 ,..., fc -I- 1 and Pi, i = 2,..., fc + 1 denote the diagonal and subdiagonal 


SOME RESULTS ON REGULARIZATION OF LSQR 


5 


elements of the {k + 1) x k lower bidiagonal matrix Bj^, respectively. At iteration k, 
LSQR computes the solution with 

= argmin ||||&||ei - Bky\\. 
yeR'' 

Note that P^+ib = ||&||ei. We get 

(2.3) = Qfci/W = WbWQkBlei = 

As stated in the introduction, LSQR exhibits semi-convergence at some iteration: 
The iterates become better approximations to Xtme until some iteration k, and 
the noise will dominate the after that iteration. The iteration number k plays 
the role of the regularization parameter. However, semi-convergence does not neces¬ 
sarily mean that LSQR finds a best possible regularized solution as B^ may become 
ill-conditioned before k < ko but does not yet contain all the needed fcp dominant 
SVD components of A. In this case, in order to get a best possible regularized solu¬ 
tion, one has to use a hybrid LSQR method, as described in the introduction. The 
significance of (2.3) is that the LSQR iterates can be interpreted as the minimum- 
norm least squares solutions of the perturbed problems that replace A in (1.1) by 
its rank k approximations whose nonzero singular values are just those 

of Bk- If the singular values of Bk approximate the k large singular values of A in 
natural order for fc = 1, 2,..., ko, then LSQR must have the full regularization, and 
the regularized solution x^^“^ is best possible and is as comparably accurate as the 
best possible regularized solution by the TSVD method. 

Hansen’s analysis [20, p. 146] shows that the LSQR iterates have the filtered SVD 
expansions: 




Jk) _ 




CTi 


/T\ ^ {lA 

where f- — 1 — ^ , i = 1,2,... ,n, and 9) , j = 1,2,... ,k are the singular 

(k) (k) 

values of Bk- In our context, if we have 0), ' < ak„+i for some k < ko, the factors f- \ 
i = k + 1,... ,n are not small, meaning that is already deteriorated and becomes a 
poorer regularized solution, namely, LSQR surely does not have full regularization. As 
a matter of fact, in terms of the best possible solution , it is easily justified that 

the full regularization of LSQR is equivalent to requiring that the singular values of 
Bk approximate the k largest singular values of A in natural order for k = 1,2,... ,ko, 
so it is impossible to have < <Xka+i for ^ ^ ^o- 

The regularizing effects of LSQR critically depend on what ]Ck{A^A, A^h) mainly 
contains and provides. Note that the eigenpairs of A are the squares of singular val¬ 
ues and right singular vectors of A, and the tridiagonal matrix B^Bk is the projected 
matrix of A^A onto the subspace ICk{A^A, A^b), which is obtained by applying the 
symmetric Lanczos tridiagonalization process to A^A starting with qi = A^6/|| A^&|| 
[6]. We have a general claim deduced from [6, 34] and exploited widely in [20, 22]: 
The more information the subspace ICk{A'^A, A^b) contains on the k dominant right 
singular vectors, the more possible and accurate the k Ritz values approximate the k 
largest singular values of A; on the other hand, the less information it contains on the 
other n — k right singular vectors, the less accurate a small Ritz value is if it appears. 
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For our problem, since the small singular values of A are clustered and close to zero, 
it is expected that a small Ritz value will show up as k grows large, and it starts to 
appear more late when JCkiA"^ A, A^b) contains less information on the other n — k 
right singular vectors. In this sense, we say that LSQR has better regularizing effects 
since contains more dominant SVD components. 

Using the definition of canonical angles Q{X,y) between the two subspaces X 
and y of the same dimension [36, p. 250], we have the following theorem, which shows 
how well the subspace ICk{A^A, A^b), on which LSQR and CGLS work, captures the 
fc-dimensional dominant right singular space. 

Theorem 2.1. Let the SVD of A be (1.2), and assume that its singular values 
are distinet and satisfy Uj = with a > 0. Let Vk = span{Vk} be the subspace 

spanned by the eolumns of Vk = {vi,V 2 , ■ ■ ■ and Vf. = ICk{A^A,A^b). Then 

(2.4) ||sin0(Vfc,V^^)|| = 

x/l + l|Afc|P 

with the (n — k) X k matrix Ak to be defined by (2.6) and 


(2.5) \\Ak\\F< 


gfc+i \ujb\ 

Cfc min^^;i \ujb\ 


y'k{n-k){l + 0{e-^°)), ft = l,2 ,...,n-1. 


Proof. Let U = (ui, U 2 ,..., it^) consist of the first n columns of U defined in 
(1.2). We see ICk{T,^, SAA^b) is spanned by the columns of the n x ft matrix DTk with 

„2k-2 \ 

... a. \ 


D = diag (a^U'^b) , Tfc = 

Partition the matrices D and Tfc as follows: 

Di 0 


1 


V1 


2k-2 


^2k-2 


D = 


0 D2 


Tk = 


Tfci 

Tfc2 


where Di,Tki G Since Tki is a Vandermonde matrix with cfj distinct for 

^ A j A k, is nonsingular. Thus, by the SVD of A, we have 


lCk{A^ A,A^b) = span{V DTk} = span 


('(S 


Tfei 

2Tfe2 


= span < V 


L 

Ak 


with 

( 2 . 6 ) Ak = D2Tk2T^,^Df\ 

Define Zk = V ^ ^ . Then Z]fZk = / + A^Afc and the columns of ZkiZ}^Zk)~^ 

form an orthonormal basis of V|. 

Write V = (14,14"^). By definition, we obtain 

||sin0(Vfc,Vfc«)|| = {Vk^fZk{ZlZk)-"^\ 

iVk^fvf / 


= ||Afc(/ + ArAfc)-^|| = 


AT A 


k^k) 

IIAfcll 
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which proves (2.4) and indicates that || sin 0(Vfc, V^)|| is inonotonically increasing with 
respect to ||Afc||. 

We next estimate ||Afe||. We have 


Afeii < IIAII;^ = < P 2 II ||Tfe2r-'|L lli^r" 


(2.7) 


CTfe+i \vjb\ 


Tk2T, 


kl IIf ■ 


Tfc minj^^ Injfcl 

So we need to estimate ||7fe27fei^ ||_f- easily justified that the f-th column of 
consists of the coefficients of the Lagrange polynomial 

i,“’(A)= n 




that interpolates the elements of the i-th canonical basis vector S at the 
abscissas ai,a 2 , ■ ■ ■ ,<7^. Consequently, the i-th column of Tk2Tj7i is 

... ,Lf)(a^))^ , 


from which we obtain 


( 2 . 8 ) 


Tk2T^i = 


f ••• 

L[''\<71+2) L2\^1+2) Lf\<^l+2) 






Since |Lp^(A)| is monotonic for \ < a\, it is bounded by |L-^^(0)|. Furthermore, let 
l-^io^(0)l = niaxi=i_ 2 ,....fc |-^i^^(0)|. Then for i = 1, 2,..., fc and a > 0 we have 


iir^(o)i < ii';:^(o)i = n 

f=l 

io —1 

= n 




. 0-1 ^2 


n 


-I 


- T? 


1 


n 


j=l 3 j=io+l ®0 j 
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1 - C>(e“2(*o“j)“) C>(e2L-*o)“) - 1 

= 1 ' ' j=io + l ' ^ 

_i_ fr _ I _ 

ii I — Q(e-2i^o-j)a) 11 I _ Q(^-2{j-to)a\ k 

1=1 1 ' i=io+i ^ ^ n 


0 

io —1 

n 


1 


n C>(e2L-*o)“) 

i=io + l 


io \ ( fe—io + 1 ' 

1 + S 0(e-2J“) 1 + ^ 0(e-2J“) 


(2.9) 


i=i 


i=i 


n C>(e20-*o)a) 

i=io + l 


by absorbing those higher order terms into 0{-). Note that in the above numerator 
we have 


l + ^0(e-2^“) = l + 0 =l + of^--^(l-e-2*«“)ji 


i=i 


Af=i 
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k—io + l 




1+ = 1 + 0 e 

i=i \ i=i 


-2ia 1 _ 


It is then easily seen that their product is 


1 + 0 ( —^-^(1 - 

\ I _ ^-2.0. 


l+(9 


2e- 


1 — e 


-2a 


O 


1 — e 


-2a 


= 1^0 


2e- 


1 — e 


-2a 


= l + C>(e-2“). 


k / \ 2 k 

On the other hand, by definition, the denominator (^) = 11 

j=iQ + l ^ Z 

in (2.9) is exactly one for fo = fc, and it is strictly bigger than one for fo < k. Therefore, 


for any k, we have \L, 
(2.8) it follows that 




= niaxi=i^ 2 ,...,fe l-^i^^(0)l = 1 + 0(e ^“). From this and 


\'^k2T^l\\p < '/k 


rp rp — \ {k) 

^k2-^kl ^k 


< ^k{n-k)\LfJm = ^k{n-k){l + 0(e-2“)). 
Therefore, for i = 1, 2,..., n — 1 and a > 0 considerably, from (2.7) we have 

( 2 . 10 ) 


lAfell^ < + 0(e-2«)). 

Iwj h\ 


^k 


mm 


i=i 


Remark 2.1 We point out that (2.5) should not be sharp. As we have seen 

(Tfc+i |«f b| 

-- 


from the proof, the factor 


seems intrinsic and unavoidable, but the 


factor \/k{n — k) in (2.5) is an overestimate and can certainly be reduced. (2.10) is 

an overestimate since |L^^^(0)| for i not near to k is considerably smaller than |Tip^(0|, 
but we replace all them by their maximum l + 0{e~'^°‘). In fact, our derivation clearly 
illustrates that the smaller i is, the smaller |L^^^(0)| than |l[.^^(0)|. 

Recall the discrete Picard condition (1.4). Then 


( 2 . 11 ) 


Cfc = 


max^^fc+i \ujb\ 


idufS + u^el) crlt^ + 


K+i^\ 


min^^i |m^6| + uje\) 




We observe that Ck 

,T 


^fc + l 


that all the \u^h\ ~ almost remain the same. Thus, we have Ck 


Yipj < 1 almost remains constant for k < ko- For k > ko, note 

k 

T„i ru., TffQ have Ck ~ 1, meaning 

that V| does not capture Vk as well as it does for k < ko- 

Remark 2.2 The theorem can be extended to moderately ill-posed problems 
with the singular values Uj = 0(j““), a > 1 considerably and k not big since, in a 
similar manner to the proof of Theorem 2.1, we can obtain by the first order Taylor 
expansion 


-(k). 


k-1 


j=i V “ ^ 


i4"^(o)i=n 

k-1 

TT-^— 

1 - 0((|)2“) 


-I 


J = 1 



= 0 { 1 ), 


t=l 
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which, unlike {l+0{e ^“)) for severely ill-posed problems, depends on k and increases 
slowly with k for a > 1 considerably. However, for mildly ill-posed problems, from 
above we have |L-^^(0)| > 1 considerably for a < 1. 

Remark 2.3 A combination of (2.4) and (2.5) and the above analysis indicate 
that VI captures Vk better for severely ill-posed problems than for moderately ill- 
posed problems. There are two reasons for this. The first is that the factors ak+ijuk 
are basically fixed constants for severely ill-posed problems as k increases, and they 
are smaller than the counterparts for moderately ill-posed problems unless the degree 
a of its ill-posedness is far bigger than one and k small. The second is that the 
factor 0(1) is smaller for severely ill-posed problems than the factor 1 -|- 0{e~^°‘) for 
moderately ill-posed problems for the same k. 

Remark 2.4 The situation is fundamentally different for mildly ill-posed prob¬ 
lems: Firstly, we always have |Tip^(0)| > 1 substantially for a < 1 and any k, which 
is considerably bigger than 0(1) for moderately ill-posed problems for the same k. 
Secondly, Ck defined by (2.11) is closer to one than that for moderately ill-posed prob¬ 
lems for /c = 1, 2,..., fco- Thirdly, for the same noise level ||e|| and /3, we see from the 
discrete Picard condition (1.4) and the definition of fco that fcp is bigger for a mildly 
ill-posed problem than that for a moderately ill-posed problem. All of them show 
that captures Vk considerably better for severely and moderately ill-posed prob¬ 
lems than for mildly ill-posed problems for /c = 1, 2,..., /cp. In other words, our results 
illustrate that contains more information on the other n — k right singular vectors 
for mildly ill-posed problems, compared with severely and moderately ill-posed prob¬ 
lems. The bigger fc, the more it contains. Therefore, V| captures Vk more effectively 
for severely and moderately ill-posed problems than mildly ill-posed problems. That 
is, contains more information on the other n — k right singular vectors for mildly 
ill-posed problems, making the appearance of a small Ritz value more possible before 
fc < fcp and LSQR has better regularizing effects for the first two kinds of problems 
than for the third kind. Note that LSQR, at most, has the full regularization, i.e., 
there is no Ritz value smaller than (Jkg+i for /c < fcp, for severely and moderately 
ill-posed problems. Our analysis indicates that LSQR generally has only the partial 
regularization for mildly ill-posed problem and a hybrid LSQR should be used. 

Remark 2.5 Relation (2.5) and Ck indicate that captures Vk better for 
severely ill-posed problems than for moderately ill-posed problems. There are two 
reasons for this. First, the all the Ok+ijok are basically a fixed constant p~^ for 
severely ill-posed problems, which is smaller than those ratios for moderately ill- 
posed problems unless a is rather big and k small. Second, the quantities |L-^^(0)| = 
1 + 0{e~^°‘) for severely ill-posed problems are smaller than the corresponding Oil) 
for moderately ill-posed problems. 

Let us investigate more and get insight into the regularization of LSQR. Define 

( 2 . 12 ) -ik = \\A-Pk+iBkQl\\, 

which measures the quality of the rank k approximation Pk+iBkQ^ to A. Based on 
(2.5), we can derive the following estimate for jk- 

Theorem 2.2. Assume that (1.1) is severely or moderately ill posed. Then 

(2.13) CTfe+i < 7 fc < CTfc+i +CTi||sin0(Vfe,Vfc)||. 


Proof. Let Ak = UkV^kVj^ be the best rank k approximation to A with respect 
to the 2-norm, where Uk = (ui, •. • ,Ufc), 14 = ('Ci, ■ ■ ■, r’fe) and = diag(cri,..., crfc). 


10 


YI HUANG AND ZHONGXIAO JIA 


Since Pk+iBkQ"[ is of rank k, the lower bound in (2.13) is trivial by noting that 
7 fc ^ 11^ ~ ^fcll = Cfc+i- We now prove the upper bound. From (2.1), we obtain 

11 ^ ~ Pk+iBkQ'k\\ = 11^ ~ ^QfcQril ■ 

It is easily known that = ICk{A"^A, A'^b) = span{Qk} with Qk having orthonormal 

columns. Then by the definition of || sin0(Vfc, V^)|| we obtain 

= WiA-Uk^kV^ + Uk^kV;[)iI-QkQl)\\ 

< ||(A - Uk^kV^){I - QkQl)\\ + \\UkT.kV^{I - QkQl)\\ 

< o-fe+i + ||Sfc|| IjV)?’(I — QfcQfc)|| 

= cTfe+i + CTill sin0(Vfc, Vfc)||. □ 


Numerically, it has been extensively observed in the literature that the 7 ^ decay 
as fast as Uk+i and, more precisely, 7 ^ ~ Ok+i for severely ill-posed problems; see, e.g., 
[3, 16, 17]. They mean that the Pk+iBkQjl are very good rank k approximations to A. 
Recall that the TSVD method generates the best regularized solution b. 

As a result, if 'yko ~ ^feo-i-i; the LSQR iterate = QkoTkgQko+i^ reasonably 
close to the TSVD solution for (Tko+i is reasonably small. This means that 

LSQR has the full regularization and does not need any additional regularization to 
improve x^^°\ As our experiments will indicate in detail, these observed phenomena 
are of generality for both severely and moderately ill-posed problems and thus should 
have strong theoretical supports. Compared to the observations, our (2.13) appears 
to be a considerrable overestimate. 

We next present some results on Uk+i appearing in (2.2). If ak+i = 0, the Lanczos 
bidiagonalization process terminates, and we have found k exact singular triples of A 
[26]. In our context, since A has only simple singular values and b has components in 
all the left singular vectors, early termination is impossible in exact arithmetic, but 
small Ofe+i is possible. We aim to investigate how fast a^+i decays. We first give a 
refinement of a result in [17]. 

Theorem 2 . 3 . Let Bk = WkQkSl he the SVD of Bk, where Wk G M('=+i)x(fc+i) 
and Sk G are orthogonal, and Qk G and define Uk = Pk+iWk and 

Vk = QkSk- Then 


(2.14) 

(2.15) 


AVk - UkQk = 0 , 


A^Uk 



— Otk+l- 


Proof From (2.1) and Bk = WkQkS'^, we obtain 

AVk = AQkSk = Pk+iBkSk = UkQk- 
So (2.14) holds. From (2.2), we get 

X^Uk = A^Pk+iWk 

= QkBkWk + ak+iqk+iel^iWk 
— Qk^k^k 

= VkQk P ak+iQk+ie^+iWk- 
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Note that ||gfc+i|| = = 1- Then we get 


(2.16) 


A^Uk - Vk&l 


oik+i ||(?fc+iefc+iWfe|| = ak+i- □ 


We remark that it is an inequality other than the equality in a result of [17] similar 
to (2.15). 

In combination with the previous results and remarks, this theorem shows that 
once ak+i becomes small for not big fc, the k singular values of Bk may approximate 
the large singular values of A, and it is more possible that no small one appears for 
severely ill-posed problems and moderately ill-posed problems. 

As our final result, we establish an intimate and interesting relationship between 
ak+i and 7 fe, showing how fast Uk+i decays. 

Theorem 2.4. It holds that 


(2.17) 


Olk+l < Ik- 


Proof. With the notations as in Theorem 2.3, we have Pk+iBkQ^ = UkQkVjF. 
So, by (2.12), we have 


Ik = 


A - UkOkVk^ 


Note that U^Uk = I- Therefore, from (2.16) we obtain 


Olk+l = 


< 


A^Uk - Vk&l 
A^iJkUl - VkQliJl 
A^iikiil -VkQluIiikUl 


[A^'-VkQWr 


)UkU> 


A^-Vk&lu. 


T ' 
k , 


UkU, 


= A-UkOkVif =7fe. 


The theorem indicates that ak+i decays at least as fast as jk, which, in turn, 
means that ak+i may decrease in the same rate as Ck+i, as observed in [3, 16, 17] for 
severely ill-posed problems. 

3. Numerical experiments. In this section, we report numerical experiments 
to illustrate the the regularizing effects of LSQR. We will demonstrate that LSQR 
has the full regularization for severely and moderately ill-posed problems, stronger 
phenomena than our theory proves, but it only has the partial regularization for 
mildly ill-posed problems, in accordance with our theory, for which a hybrid LSQR 
is needed to compute best possible regularized solutions. We choose several ill-posed 
examples from Hansen’s regularization toolbox [21]. All the problems arise from the 
discretization of the first kind Fredholm integral equation 

(3.1) f K{s,t)x{t)dt = b{s), c<s<d. 

J a 
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For each problem we use the codes of [21] to generate a 1024 x 1024 matrix A, true 
solution Xtrue and noise-free right-hand b. In order to simulate the noisy data, we 
generate the Gaussian noise vector e whose entries are normally distributed with mean 
zero. Defining the noise level e = |||, we use e = 10“^, 10“^, 10“^, respectively, in 
the test examples. To simulate exact arithmetic, the full reorthogonalization is used 
during the Lanczos bidiagonalization process. We remind that, as far as ill-posed 
problem ( 1 . 1 ) is concerned, our primary goal consists in justifying the regularizing 
effects of iterative solvers, which are unaffected by sizes of ill-posed problems and only 
depends on the degree of ill-posedness. Therefore, for this purpose, as extensively 
done in the literature (see, e.g., [ 20 , 22 ] and the references therein), it is enough to 
test not very large problems. Indeed, for n large, say, 1,0000 and more, we have 
observed completely the same behavior as that for n not large, e.g., n = 1024 used in 
this paper. A reason for using n not large is because such choice makes it practical 
to fully justify the regularization effects of LSQR by comparing it with the TSVD 
method, which suits only for small and/or medium sized problems for computational 
efficiency. All the computations are carried out in Matlab 7.8 with the machine 
precision Cmach = 2.22 x 10“^® under the Microsoft Windows 7 64-bit system. 

3.1. Severely ill-posed problems. We consider the following two severely ill- 
posed problems [ 21 ]. 

Example 1 This problem ’Shaw’ arises from one-dimensional image restoration, 
and can be obtained by discretizing the first kind Fredholm integral equation (3.1) 
with [—f,f] as both integration and domain intervals. The kernel K{s,t) and the 
solution x(t) are given by 


K{s,t) 


(cos(s) -I- cos(t))^ 


sin(u) 

u 


2 


u = 7r(sin(s) -|- sin(t)). 


x{t) = 2 exp(—6(t — 0.8)^) -I- exp(—2(t -I- 0.5)^). 

Example 2 This problem ’Wing’ has a discontinuous solution and is obtained 
by discretizing the first kind Fredholm integral equation (3.1) with [0,1] as both 
integration and domain intervals. The kernel K{s,t), the solution x(t) and the right- 
hand side b{s) are given by 


K{s,t) = texp(—st^). 


b{s) 


exp(-^s) - exp(-|s) 
2s 


x{t) 


1 — < t < —■ 

1, g ^ I- ^ g, 

0 , elsewhere. 


These two problems are severely ill-posed, whose singular values Cj = 0{e~°‘^) 
with a = 2 for ’Shaw’ and a = 4.5 for ’Wing’, respectively. 

In Figure 1, we display the curves of the sequences 7 ^ and ak+i with e = 
10“^, 10“^, 10“"^, respectively. They illustrate that the quantities 7 *, decrease as fast 
as CTfe+i and both of them level off at the level of Cmach for k no more than 20 , and 
after that these quantities are purely round-offs and are reliable no more. Moreover, 
the curves of quantities ak+i always lie below those of 7 ^, which coincides with The¬ 
orem 2.4. We can see that the decaying curves with different noise levels are almost 
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(a) (b) 




(c) 


(d) 


Fig. 1. (a)-(b): Plots of decaying behavior of the sequences 7^., and ctk+i for the problem 

Shaw with e = 10“^ flcft) and e = 10“® (right); (c)-(d): Plots of decaying behavior of the sequences 
7 J; and for the problem Wing with e = 10“® (left) and e = 10“^ (right). 


the same. Furthermore, we observe that 7 ^ « ak+i for severely ill-posed problems, 
indicating that the Pp+iBkQ^ are very good rank k approximations to A with the ap¬ 
proximate accuracy (t^+i and that Bp does not become ill-conditioned before k < ko. 
As a result, the regularized solutions become better approximations to Xtme until 
iteration fcp, and they are deteriorated after that iteration. At iteration ko, only 
captures the ko dominant SVD components of A and suppress the other (n — ko) SVD 
components, so that it is a best possible regularized solution. As a result, the pure 
LSQR has the full regularization for severely ill-posed problems. We will give a more 
direct justification on these assertions in Section 3.3. 

In Figure 2, we plot the relative errors — a:tr.„e||/||a;true|| with different 
noise levels for these two problems. Obviously, LSQR exhibits semi-convergence phe¬ 
nomenon. Moreover, for smaller noise level, we get better regularized solutions at the 
cost of more iterations, as expected. 

3.2. Moderately ill-posed problems. We now consider the following two 
moderately ill-posed problems [ 21 ]. 

Example 3 This problem ’Heat’ arises from the inverse heat equation, and 
can be obtained by discretizing Volterra integral equation of the first kind, a class of 
equations that is moderately ill-posed, with [0,1] as integration interval. The kernel 
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Fig. 2. The relative errors — 3;t7-ue||/||3;triie|| with respect to e = 10 ^,10 ^,10 ^ for 

the problems Shaw (left) and Wing (right). 


K{s, t) = k{s — t) with 

, , , t-3/2 / 1 \ 

Example 4 This problem is the famous Phillips’ test problem. It can be 
obtained by discretizing the first kind Fredholm integral equation (3.1) with [—6,6] 
as both integration and domain intervals. The kernel the solution x{t) and 

the right-hand side &(s) are given by 



From Figure 3, we see that jp decreases as fast as CTfe+i, and ak+i decays as fast 
as 7 fc. However, slightly different from severely ill-posed problems, we can observe 
that the jp may not be so close to the Uk+i, as reflected by the thick rope formed by 
three lines. By comparing the behavior of 7 fc for severely and moderately ill-posed 
problem, we come to the conclusion that the /c-step Lanczos bidiagonalization may 
generate more accurate rank k approximation for severely ill-posed problems than for 
moderately ill-posed problems, namely, the rank k approximation Pp+iBkQk may be 
more accurate for severely ill-posed problems than for moderately ill-posed problems. 
Nonetheless, we have seen that, for the test moderately ill-posed problems, all the 
7 fc are still excellent approximations to the ctk+i, so that LSQR still has the full 
regularization. 

In Figure 4, we depict the relative errors of , and observe analogous phenomena 
to those for severely ill-posed problems. A distinction is that now LSQR needs more 
iterations for moderately ill-posed problems with the same noise level. 













Relative error 


SOME RESULTS ON REGULARIZATION OF LSQR 


15 



(a) (b) 



(c) 


(d) 


Fig. 3. (a)-(b): Plots of decaying behavior of the sequences 7 ^, crfc+i foT the problem 

Heat with e = 10“^ ^ = 10“^ (right); (c)-(d): Plots of decaying behavior of the sequences 

7 fc and crfc+i problem Phillips with e = 10“^ £ = 10““^ (right). 




(a) (b) 

Fig. 4. The relative errors — 3:trite|| /ll^Jiriiell with respect to e = 10“^, 10“^, 10“'^ for 

the problems Heat (left) and Phillips (right). 
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3.3. Comparison of LSQR with and without additional TSVD regular¬ 
ization. For the previous four severely and moderately ill-posed problems, we now 
compare the regularizing effects of the pure LSQR and the hybrid LSQR with the ad¬ 
ditional TSVD regularization used within projected problems. We show that LSQR 
has the full regularization and no additional regularization is needed, which is based 
on the observation that at semi-convergence the regularized solution by LSQR is as 
accurate as that obtained by the hybrid LSQR for each problem. 

In the sequel, we only report the results for the noise level e = 10“^. Results for 
other e are analogous and thus omitted. 

Figures 5 (a)-(b) and Figures 6 (a)-(b) indicate that the relative errors of ap¬ 
proximate solutions obtained by the two methods reach the same minimum level, and 
the hybrid LSQR simply stabilizes the regularized solutions with the minimum error. 
This means that the pure LSQR itself has already found a best possible regularized 
solution at semi-convergence and no additional regularization is needed. So it has 
the full regularization. Our task is to determine such k, which is the iteration where 
|| 2 ;(fc-i-i) II starts to increase dramatically while its residual norm remains almost un¬ 
changed. The L-curve criterion fits nicely into this task. In these examples, we also 
choose Xreg = argmiiifc ||a;^^^ — Xiruell for the pure LSQR. Figure 5 (c) and Figures 6 
(c)-(d) show that the regularized solutions are generally very good approximations to 
the true solutions. However, we should point out that for the problem ’Wing’ with a 
discontinuous solution, the large relative error indicates that the regularized solution 
is a poor approximation to the true solution, as depicted in Figure 5 (d). Such phe¬ 
nomenon is due to the fact that the regularization of LSQR and its hybrid variants 
is unsuitable for the ill-posed problems with discontinuous solutions. For such kind 
of problems, more reasonable regularization is Total Variation Regularization, which 
takes the form mina;gRn \\Ax — 6|p -|- A^||La;||f with L ^ I some pxn matrix and || • ||i 
the 1-norm [22]. 

In what follows, we compare the regularizing effects of the pure LSQR and hybrid 
LSQR for mildly ill-posed problems, showing that LSQR has only the partial regu¬ 
larization and a hybrid LSQR should be used for this kind of problem to improve the 
regularized solution by LSQR at semi-convergence. 

Example 5 The problem ’deriv2’ is mildly ill-posed, which is obtained by dis¬ 
cretizing the first kind Fredholm integral equation (3.1) with [0,1] as both integration 
and domain intervals. The kernel K(s^ t) is Green’s function for the second derivative: 



s <t; 
s >t, 


and the solution x(t) and the right-hand side b(s) are given by 



Figure 7 (a) shows that the relative errors of approximate solutions by the hybrid 
LSQR reach a considerably smaller minimum level than those by the pure LSQR, a 
clear indication that LSQR has the partial regularization. As we have seen, the hybrid 
LSQR expands the Krylov subspace until it contains enough dominant SVD compo¬ 
nents and, meanwhile, additional regularization effectively dampen the SVD compo¬ 
nents corresponding to small singular values. For instance, the semi-convergence of 
the pure LSQR occurs at iteration k = 3, but it is not enough. As the hybrid LSQR 
shows, we need a larger six dimensional Krylov subspace ICe{A'^A, A^b) to construct 


Relative error 
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(C) (d) 

Fig. 5. (a)-(b): The relative errors — a:true|| with respect to LSQR and LSQR 

with additional TSVD regularization for e = 10“^; (c)-(d): The regularized solutions Xreg for the 
pure LSQR for the problems Shaw (left) and Wing (right). 


a best possible regularized solution. We also choose Xreg = arginine — a;triie|| 
for the pure LSQR and the hybrid LSQR. Figure 7 (b) indicates that the regularized 
solution obtained by the hybrid LSQR is a considerably better approximation to Xtme 
than that by the pure LSQR, especially in the non-smooth middle part of Xtme- 

4. Conclusions. For large-scale discrete ill-posed problems, LSQR and CGLS 
are commonly used methods. These methods have regularizing effects and exhibit 
semi-convergence. However, if a small Ritz value appears before the methods cap¬ 
ture all the needed dominant SVD components, the methods have only the partial 
regularization and must be equipped with additional regularization so that best pos¬ 
sible regularized solutions can be found. Otherwise, LSQR has the full regularization 
and can compute best possible regularized solutions without additional regularization 
needed. 

We have proved that the underlying fc-dimensional Krylov subspace captures the 
k dimensional dominant right singular space better for severely and moderately ill- 
posed problems than for mildly ill-posed problems. This makes LSQR have better 
regularization for the first two kinds of problems than for the third kind. Furthermore, 
we have shown that LSQR generally has only the partial regularization for mildly 
ill-posed problems. Numerical experiments have demonstrated that LSQR has the 
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(a) (b) 




(c) 


(d) 


'.SQ 


Fig. 6 . (a)-(b): The relative errors — a?tri 4 e||/||a:triie|| obtained by the pure LSQR and 

R with the additional TSVD regularization for e = 10“^; (c)-(d): The regularized solutions 
for the pure LSQR for the problems Heat (left) and Phillips (right). 




(b) 


Fig. 7. The relative errors — a^truejl/H^^truell o,nd the regularized solution Xreg with 

respect to LSQR and LSQR with the additional TSVD regularization for the problem Deriv2 and 

£ = 10 - 3 . 
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full regularization for severely and moderately ill-posed problems, stronger than our 
theory predicts, and it has the partial regularization for mildly moderately ill-posed 
problems, compatible with our assertion. Together with the observations [3, 16, 17], it 
appears that the excellent performances of LSQR on severely and moderately ill-posed 
problems generally hold. 

As for future work, it is more appealing to derive an accurate estimate for ||Afc|| 
other than ||Afc||F, as it plays a crucial role in analyzing the accuracy 7 ^ of the rank 
k approximation, generated by Lanczos bidiagonalization, to A. Accurate bounds for 
are the core of completely understanding the regularizing effects of LSQR, but our 
bound ( 2 . 12 ) for 7 ^ is conservative and is expected to be improved on substantially. 
Since CGLS is mathematically equivalent to LSQR, our results apply to CGLS as 
well. Our current work has helped to better understand the regularization of LSQR 
and CGLS. But for a complete understanding of the intrinsic regularizing effects of 
LSQR and CGLS, we still have a long way to go, and more research is needed. 

Acknowledgements, We thank the three referees very much for their valuable 
suggestions and comments, which made us improve the presentation of the paper. 
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